Skip to contents

Built with R 4.2.2


The basic premise behind any form of redistribution is that a given set of observation can be translated to some superset or subset.

This means the basic requirements for redistribution are:

  1. a source with observations
  2. a target without observations
  3. a map between source and target entities

The map associates a given source ID to a given target ID. If one source ID maps onto many target IDs, then we are disaggregating the source observation (translating one observation to many). If each source ID maps into a single target ID (and potentially many source IDs are mapping onto that target ID), then we are aggregating the source observations (translating many observations to one).

Basic Example

Say we have 1 set of 5 regions:

regions <- data.frame(id = 1:5)
regions
#>   id
#> 1  1
#> 2  2
#> 3  3
#> 4  4
#> 5  5

Disaggregate

If we have a single observation for the entire set, we could disaggregate it to the regions. With no additional information, our best guess for the value of each region is a proportional split – in this case, one fifth of the observed value for each region:

# install if needed: remotes::install_github("uva-bi-sdad/redistribute")
library(redistribute)

set_value <- 1
(redistribute(set_value, regions, verbose = TRUE))
#> 
[36mℹ
[39m source IDs: 
[32m
[32m1
[32m
[39m
#> 
[36mℹ
[39m target IDs: 
[32m
[32mid
[32m
[39m column of `target`
#> 
[36mℹ
[39m map: all target IDs for single source
#> 
[36mℹ
[39m weights: 
[32m
[32m1
[32m
[39m
#> 
[36mℹ
[39m redistributing 
[32m
[32m1
[32m
[39m variable from 
[32m
[32m1
[32m
[39m source to 
[32m
[32m5
[32m
[39m targets:
#> 
[36m•
[39m (numb; 1) V1
#> 

[36mℹ
[39m disaggregating...


[32m✔
[39m done disaggregating 
[38;5;249m[8ms]
[39m
#>   id  V1
#> 1  1 0.2
#> 2  2 0.2
#> 3  3 0.2
#> 4  4 0.2
#> 5  5 0.2

Or, maybe we know a little more about the regions, such as their size; then we could use that information to adjust the proportion allotted to each region:

region_values <- redistribute(
  set_value, regions,
  weight = c(1, 10, 10, 20, 50), verbose = TRUE
)
#> 
[36mℹ
[39m source IDs: 
[32m
[32m1
[32m
[39m
#> 
[36mℹ
[39m target IDs: 
[32m
[32mid
[32m
[39m column of `target`
#> 
[36mℹ
[39m map: all target IDs for single source
#> 
[36mℹ
[39m weights: `weight` vector
#> 
[36mℹ
[39m redistributing 
[32m
[32m1
[32m
[39m variable from 
[32m
[32m1
[32m
[39m source to 
[32m
[32m5
[32m
[39m targets:
#> 
[36m•
[39m (numb; 1) V1
#> 

[36mℹ
[39m disaggregating...


[32m✔
[39m done disaggregating 
[38;5;249m[7ms]
[39m
region_values
#>   id         V1
#> 1  1 0.01098901
#> 2  2 0.10989011
#> 3  3 0.10989011
#> 4  4 0.21978022
#> 5  5 0.54945055

Aggregate

If we have observations for all regions, we could aggregate to a single value for the set. In this case, we can re-aggregate what we initially disaggregated, to recover the original observation:

(redistribute(region_values, verbose = TRUE))
#> 
[36mℹ
[39m source IDs: 
[32m
[32mid
[32m
[39m column of `source`
#> 
[36mℹ
[39m target IDs: 
[32m
[32m1
[32m
[39m
#> 
[36mℹ
[39m map: all source IDs to a single target
#> 
[36mℹ
[39m weights: 
[32m
[32m1
[32m
[39m
#> 
[36mℹ
[39m redistributing 
[32m
[32m1
[32m
[39m variable from 
[32m
[32m5
[32m
[39m sources to 
[32m
[32m1
[32m
[39m target:
#> 
[36m•
[39m (numb; 1) V1
#> 

[36mℹ
[39m aggregating...


[32m✔
[39m done aggregating 
[38;5;249m[7ms]
[39m
#>   id V1
#> 1  1  1

Applied Example

One use case for redistribution is converting demographics data between geographic layers.

For illustration, we can look at U.S. Census data in Fairfax, Virginia:

base_dir <- "~/Downloads"
# remotes::install_github("uva-bi-sdad/catchment")
library(catchment)
library(sf)

# download population and data shapes
population <- download_census_population(base_dir, "VA", 2020)$estimates
#>  downloading Virginia American Community Survey summary files
#>  decompressing Virginia American Community Survey summary files
#>  reformatting Virginia American Community Survey summary files
#>  writing Virginia files
#>  wrote file:
#>  
population <- population[grepl("^51(?:059|600)", population$GEOID), ]
population[, -1] <- vapply(
  population[, -1], as.numeric, numeric(nrow(population))
)
rownames(population) <- population$GEOID
block_groups <- st_transform(
  download_census_shapes(base_dir, "VA", "bg", year = 2020), "WGS84"
)
#> Writing layer `cb_2020_51_bg_500k' to data source 
#>   `../../introduction/cb_2020_51_bg_500k.geojson' using driver `GeoJSON'
#> Writing 5951 features with 11 fields and geometry type Multi Polygon.
block_groups <- block_groups[block_groups$GEOID %in% population$GEOID, ]
population <- population[block_groups$GEOID, ]
st_geometry(population) <- block_groups$geometry
population$Overall <- population$TOTAL.POPULATION_Total

# prepare a map
library(leaflet)
pal <- colorNumeric(scico::scico(255, palette = "lajolla"), population$Overall)
map <- leaflet(
  population,
  options = leafletOptions(attributionControl = FALSE)
) |>
  addProviderTiles("CartoDB.Positron") |>
  addScaleBar("bottomleft") |>
  addControl("Total Population", "topright") |>
  addLayersControl(
    position = "topleft", overlayGroups = "Block Groups",
    baseGroups = c(
      "Zip Codes", "Parcels -> Zip Codes", "Block Groups -> Zip Codes",
      "Block Groups -> Parcels -> Zip Codes"
    )
  ) |>
  addLegend(
    "bottomright", pal, ~Overall,
    title = "Block Groups", opacity = 1
  ) |>
  addPolygons(
    fillColor = ~ pal(Overall), fillOpacity = 1, weight = 1,
    color = "#000", highlightOptions = highlightOptions(color = "#fff"),
    group = "Block Groups", label = ~ paste(
      GEOID, "Population:", round(Overall, 3)
    )
  )

This population information is provided at the Census Block Group level at the lowest, but we might want to look at population within zip codes.

We could try aggregating directly from block groups to zip codes, which involves calculating proportional intersections between region polygons:

# Download shapes if needed
zipcode_file <- paste0(base_dir, "/zipcode_va_fairfax.geojson")
if (!file.exists(zipcode_file)) {
  download.file(paste0(
    "https://www.fairfaxcounty.gov/euclid/rest/services/IPLS/IPLSMap",
    "/MapServer/3/query?where=1=1&outFields=*&outSR=4326&f=geojson"
  ), zipcode_file)
}
zipcodes <- read_sf(zipcode_file)

# redistribute population data from block groups
zipcode_population <- redistribute(
  population, zipcodes,
  target_id = "ZIPCODE", verbose = TRUE
)
#> 
[36mℹ
[39m source IDs: 
[32m
[32mGEOID
[32m
[39m column of `source`
#> 
[36mℹ
[39m target IDs: 
[32m
[32mZIPCODE
[32m
[39m column of `target`
#> 
[36mℹ
[39m map: intersections between geometries
#> 

[36mℹ
[39m mapping...


[32m✔
[39m done mapping 
[38;5;249m[7.1s]
[39m
#> 
[36mℹ
[39m weights: 
[32m
[32m1
[32m
[39m
#> 
[36mℹ
[39m redistributing 
[32m
[32m116
[32m
[39m variables from 
[32m
[32m693
[32m
[39m sources to 
[32m
[32m80
[32m
[39m targets:
#> 
[36m•
[39m (numb; 116)
#> 

[36mℹ
[39m aggregating...


[32m✔
[39m done aggregating 
[38;5;249m[9ms]
[39m

We could also disaggregate to the parcel level, which are point locations, then aggregate to zipcodes:

# Download shapes if needed
## https://data-fairfaxcountygis.opendata.arcgis.com/datasets/current-population
parcel_file <- paste0(base_dir, "/parcel_va_fairfax.geojson")
if (!file.exists(parcel_file)) {
  download.file(paste0(
    "https://opendata.arcgis.com/api/v3/datasets/",
    "314bfe4019754952a715be3a33384d9d_0/downloads/data",
    "?format=geojson&spatialRefId=4326&where=1=1"
  ), parcel_file)
}
parcels <- read_sf(parcel_file)

# disaggregate population data from block groups to parcels
bg_parcel_population <- redistribute(
  population, parcels,
  weight = "CURRE_POPUL", verbose = TRUE
)
#> 
[36mℹ
[39m source IDs: 
[32m
[32mGEOID
[32m
[39m column of `source`
#> 
[36mℹ
[39m target IDs: sequence, assuming map from geometries
#> 
[36mℹ
[39m map: intersections between geometries
#> 

[36mℹ
[39m mapping...


[32m✔
[39m done mapping 
[38;5;249m[2.2s]
[39m
#> 
[36mℹ
[39m weights: 
[32m
[32mCURRE_POPUL
[32m
[39m column of `target`
#> 
[36mℹ
[39m 
[32m
[32m32
[32m
[39m of 
[32m
[32m336407
[32m
[39m target IDs were dropped because they were not present in `map`
#> 
[36mℹ
[39m redistributing 
[32m
[32m116
[32m
[39m variables from 
[32m
[32m693
[32m
[39m sources to 
[32m
[32m336375
[32m
[39m targets:
#> 
[36m•
[39m (numb; 116)
#> 

[36mℹ
[39m disaggregating...


[32m✔
[39m done disaggregating 
[38;5;249m[2.6s]
[39m
#> 
[36mℹ
[39m realigning with original target IDs

# then aggregate from parcels to zip codes
bg_parcel_zipcode_population <- redistribute(
  bg_parcel_population, zipcodes,
  source_id = "id", target_id = "ZIPCODE", verbose = TRUE
)
#> 
[36mℹ
[39m source IDs: 
[32m
[32mid
[32m
[39m column of `source`
#> 
[36mℹ
[39m target IDs: 
[32m
[32mZIPCODE
[32m
[39m column of `target`
#> 
[36mℹ
[39m map: intersections between geometries
#> 

[36mℹ
[39m mapping...


[32m✔
[39m done mapping 
[38;5;249m[3.5s]
[39m
#> 
[36mℹ
[39m weights: 
[32m
[32m1
[32m
[39m
#> 
[36mℹ
[39m 
[32m
[32m25
[32m
[39m of 
[32m
[32m80
[32m
[39m target IDs were dropped because they were not present in `map`
#> 
[36mℹ
[39m redistributing 
[32m
[32m116
[32m
[39m variables from 
[32m
[32m336407
[32m
[39m sources to 
[32m
[32m55
[32m
[39m targets:
#> 
[36m•
[39m (numb; 116)
#> 

[36mℹ
[39m aggregating...


[32m✔
[39m done aggregating 
[38;5;249m[9.4s]
[39m
#> 
[36mℹ
[39m realigning with original target IDs

# since it's provided in this case, we can also just aggregate
# up from parcels directly
parcel_zipcode_population <- redistribute(
  parcels, zipcodes,
  source_id = "PIN", target_id = "ZIPCODE", verbose = TRUE
)
#> 
[36mℹ
[39m source IDs: 
[32m
[32mPIN
[32m
[39m column of `source`
#> 
[36mℹ
[39m target IDs: 
[32m
[32mZIPCODE
[32m
[39m column of `target`
#> 
[36mℹ
[39m map: intersections between geometries
#> 

[36mℹ
[39m mapping...


[32m✔
[39m done mapping 
[38;5;249m[3.3s]
[39m
#> 
[36mℹ
[39m weights: 
[32m
[32m1
[32m
[39m
#> 
[36mℹ
[39m 
[32m
[32m25
[32m
[39m of 
[32m
[32m80
[32m
[39m target IDs were dropped because they were not present in `map`
#> 
[36mℹ
[39m redistributing 
[32m
[32m7
[32m
[39m variables from 
[32m
[32m336407
[32m
[39m sources to 
[32m
[32m55
[32m
[39m targets:
#> 
[36m•
[39m (numb; 5) OBJECTID, PARCE_ID, CURRE_POPUL, LOW_ESTIM_POPUL, HIGH_ESTIM_POPUL
#> 
[36m•
[39m (char; 2) VALID_FROM, VALID_TO
#> 

[36mℹ
[39m aggregating...


[32m✔
[39m done aggregating 
[38;5;249m[9.5s]
[39m
#> 
[36mℹ
[39m re-converting categorical levels
#> 
[36mℹ
[39m realigning with original target IDs

Now we can compare the estimated total population of these different methods with those provided:

all_values <- c(
  zipcodes$POPULATION, zipcode_population$Overall,
  bg_parcel_zipcode_population$Overall, parcel_zipcode_population$CURRE_POPUL
)
pal_zip <- colorNumeric(scico::scico(255, palette = "lajolla"), all_values)
map |>
  addPolygons(
    data = zipcodes,
    fillColor = ~ pal_zip(POPULATION), fillOpacity = .8, weight = 1,
    color = "#000", highlightOptions = highlightOptions(color = "#fff"),
    group = "Zip Codes", label = ~ paste(
      ZIPCODE, "Population:", round(POPULATION, 3)
    )
  ) |>
  hideGroup("Zip Codes") |>
  addPolygons(
    data = parcel_zipcode_population,
    fillColor = ~ pal_zip(CURRE_POPUL), fillOpacity = .8, weight = 1,
    color = "#000", highlightOptions = highlightOptions(color = "#fff"),
    group = "Parcels -> Zip Codes", label = ~ paste(
      id, "Population:", round(CURRE_POPUL, 3)
    )
  ) |>
  hideGroup("Parcels -> Zip Codes") |>
  addPolygons(
    data = zipcode_population,
    fillColor = ~ pal_zip(Overall), fillOpacity = .8, weight = 1,
    color = "#000", highlightOptions = highlightOptions(color = "#fff"),
    group = "Block Groups -> Zip Codes", label = ~ paste(
      id, "Population:", round(Overall, 3)
    )
  ) |>
  hideGroup("Block Groups -> Zip Codes") |>
  addPolygons(
    data = bg_parcel_zipcode_population,
    fillColor = ~ pal_zip(Overall), fillOpacity = .8, weight = 1,
    color = "#000", highlightOptions = highlightOptions(color = "#fff"),
    group = "Block Groups -> Parcels -> Zip Codes", label = ~ paste(
      id, "Population:", round(Overall, 3)
    )
  ) |>
  showGroup("Block Groups -> Parcels -> Zip Codes") |>
  addLegend("bottomright", pal_zip, all_values, opacity = 1, title = "Zip Codes")